# Replication Data For "Estimating Heterogeneous Causal Effects of High-Dimensional Treatments: Application to Conjoint Analysis"

Version 1.0; 14 January 2025

Authors: Max Goplerud, Kosuke Imai, Nicole E. Pashley

## Overview

This `README.md` explains the replication archive for the analyses conducted in
our paper. To use `FactorHet` for your own data, please skip to the "Software"
section for installation instructions.

## File Listing

- `AOAS_repeat`:
  - `out_[0-9]+.RDS`: Output used for supplementary analyses.
- `AOAS_simulation`:
  - `out_[0-9]+.tar.gz`: Output used for the simulations.
- `code`:
  - `bs_true_values.RDS`: True values used in the simulations;
  `hh_original_replication.R` provides more information on this file.
  - `combine_repeat.R`: Script used to combine the output in `AOAS_repeat`
  - `create_figures_cjbart.R`: Script used to create the figures for comparisons 
  against `cjbart`.
  - `create_figures_hh.R`: Script used to create figures using the Hainmueller and Hopkins (2015) immigration
  data.
  - `create_figures_sims.R`: Script used to create figures from the simulations.
  - `FH_repeat.submit` and `FH_sims.submit`: HTCondor files used to run the 
  simulations and repeated split-samples.
  - `hh_original_replication.R`: Script to create Figure 1 in the paper that 
  produces results mirroring the original ones from Hainmueller and Hopkins (2015).
  - `multicore_combine.R`: Multicore script used to pull together the simulation
  results in `AOAS_simulation`
  - `prep_HH_data.R`: Script to prepare the immigration data and create 
  `packaged_data.RDS`.
  - `packaged_data.RDS`: The cleaned and formatted data from Hainmueller and
  Hopkins (2015) that is used in our analysis. *Please note*: This is not provided on 
  Dataverse directly but can be created from other Dataverse repositories as 
  outlined below ("Running Empirical Analyses").
  - `run_main.R`: Script to run the models with K=1,2,3,4, minimizing either the
  BIC or AIC.
  - `run_prepare_sims.R`: Script to run the repeated split-sample analyses.
  - `run_sims.R`: Script to run the simulations.
- `data_from_hh`: 
  - `chosen.txt`: Estimates from Hainmueller and Hopkins (2015). 
  - `repdata.dta`: Replication data from Hainmueller and Hopkins (2015). *Please note*:
  This is not provided on Dataverse directly but can be downloaded as explained in 
  "Running Empirical Analyses".
- `FH_AOAS_010825.sif`: Apptainer container used to run the main models and all 
scripts on the HPC.
- `figures`: Figures and tables used in the main text. All objects in this 
folder can be produced by running `bash create_figures.sh`.
- `final_output`:
  - `final_AME.RDS`, `final_cjoint.RDS`, `final_HTE.RDS`, 
  `final_HTE_binscatter.RDS`, `final_HTE_by_AME.RDS`, `final_OOS.RDS`, 
  `final_posterior.RDS`, `final_sos.RDS`: Processed output from the simulations
  to be used in creating the figures.
  - `repeat_final_output.RDS`: Processed output from `AOAS_repeat` to be used in 
  creating the figures.
  - All objects in this folder can be created using `bash prep_data_for_figures.sh`.
- `main_models`:
  - `out_BIC.RDS`: The main models used for the tables and figures in the main 
  text. This is produced by running `run_main.R`.
  - `out_AIC.RDS`: An alternative version that minimizes the AIC instead of the 
  BIC; it is used for a supplementary table in the appendix.
- `prep_data_for_figures.sh`: This script prepares the raw data into a form that 
  can be used to make the figures.
- `create_figures.sh`: This script creates the figures and tables in the main 
  text and supporting information.
  
## Software

To install the software, you may use the version on CRAN or GitHub:

```
# CRAN version
install.packages('FactorHet')
# GitHub version
remotes::install_github('mgoplerud/FactorHet')
```

All analyses conducted on the HPC (i.e., all estimation of models) use an
Apptainer container that is provided in the replication archive. This uses a
version of the `FactorHet` package found here (https://github.com/mgoplerud/FactorHet/commit/8564d7f) 
that is materially identical to version 1.0.0 on CRAN with some documentation 
updates and minor adjustments for CRAN submission.
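
As a minimal sketch of how a script might be run inside the provided container, the helper below wraps `apptainer exec`. The function name `run_in_container` and the `SIF` and `DRY_RUN` variables are hypothetical and not part of the archive. It assumes that Apptainer is installed, that `Rscript` is on the container's path, and that the command is issued from the archive root; individual scripts may expect a different working directory.

```bash
# Hypothetical helper (not part of the archive): run an R script inside the
# Apptainer image. Assumes Rscript is available inside the container.
# Set DRY_RUN=1 to print the command instead of executing it.
run_in_container() {
  local image="${SIF:-FH_AOAS_010825.sif}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "apptainer exec $image Rscript --vanilla $*"
    return 0
  fi
  apptainer exec "$image" Rscript --vanilla "$@"
}
```

For example, `DRY_RUN=1 run_in_container code/run_main.R` prints the command that would be executed without starting the container.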

## Running Empirical Analyses

Our empirical analysis focuses on data from Hainmueller and Hopkins (2015). It 
is available on Dataverse: 

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/25505

To fully replicate our results, please download their main data (`repdata.dta`), 
place it in the `data_from_hh` folder, and run the following script to create a
cleaned and "packaged" version (`packaged_data.RDS`) that is placed in the `code`
folder by default.

```
Rscript --vanilla code/prep_HH_data.R
```

After doing so, all of the following scripts should run without issue.
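
A quick pre-flight check can confirm that the files are where the steps above expect them. The sketch below is a hypothetical helper (`check_hh_data` is not part of the archive); it only tests for the two file locations named in this section and takes the archive root as its argument.

```bash
# Hypothetical pre-flight check (not part of the archive).
# Argument: path to the archive root (defaults to the current directory).
check_hh_data() {
  local root="${1:-.}"
  local raw="$root/data_from_hh/repdata.dta"
  local packaged="$root/code/packaged_data.RDS"
  # The raw Hainmueller and Hopkins (2015) file must be downloaded first.
  if [ ! -f "$raw" ]; then
    echo "missing raw data: $raw" >&2
    return 1
  fi
  # The packaged file only exists once the preparation script has been run.
  if [ -f "$packaged" ]; then
    echo "ready: $packaged"
  else
    echo "raw data found; packaged data not yet created"
  fi
}
```

Running `check_hh_data .` from the archive root reports which of the two stages has been completed.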

The main models in our analysis can be run as follows:

```
library(FactorHet)
library(glue)

# Load the packaged data (created in the `code` folder by default)
repdata <- readRDS('packaged_data.RDS')

# Main effects for the nine factors
fac <- c("Ed", "Gender", "Country", "Reason", "Job", "Exp", "Plans", "Trips", "Lang")
hhy_fmla <- as.formula(glue('Chosen_Immigrant ~ {paste(fac, collapse = " + ")}'))
# Add the interactions to the main-effects formula
int_hhy <- update.formula(hhy_fmla, '. ~ . + (Job + Ed + Reason) * Country + Job * Ed')

out <- list()
for (K in 2:3) {

  set.seed(78712)
  # Run the model with K groups
  est <- FactorHet_mbo(formula = int_hhy, K = K,
    design = repdata,
    moderator = ~ scale_hisp_prej_flip + party_ID +
      ppEducat + census_div + ppEthm,
    group = ~ CaseID, task = ~ contest_no, choice_order = ~ choice_id)
  out[[K]] <- est
}
```

The script `run_main.R` contains code to do this, although it runs models with
K=1,2,3,4 and also fits versions that minimize the AIC (instead of the default,
the BIC) for supplementary analyses. The above script is thus a simplified
version of that code.

## Creating Tables and Figures

To create the tables and figures in the manuscript, run the following bash
file, which calls the required `R` scripts. Doing so requires the `final_output`
and `main_models` folders to be downloaded and will put the figures into the
`figures` folder.

```
bash create_figures.sh
```

## Simulations and HPC

The simulations are run on an HPC; we use the resources provided by the Open
Science Grid (OSG), which uses an HTCondor system. In case it is helpful, we
include the `.submit` files as well as the `.R` scripts that we used, although
file paths would need to be adjusted for your specific HPC.

The simulations are structured so that their raw output is contained in 500
`.tar.gz` files. We include these in `AOAS_simulation`. However, these must be
formatted and processed before they can be used directly. The processing script
is set up to use 10 cores to speed up the process.
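
Before processing, it can be useful to confirm that all of the raw archives were downloaded. The sketch below is a hypothetical helper (`count_sim_archives` is not part of the archive) that counts files matching the `out_*.tar.gz` naming pattern from the file listing in a given folder.

```bash
# Hypothetical helper (not part of the archive): count the raw simulation
# archives named out_*.tar.gz in the folder given as the first argument.
count_sim_archives() {
  local dir="$1" n=0 f
  for f in "$dir"/out_*.tar.gz; do
    # The glob stays unexpanded when nothing matches, so test each entry.
    if [ -f "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

Pass the folder that holds the archives, for example `count_sim_archives AOAS_simulation`; a complete download should report 500.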

One appendix also runs FactorHet many times using a split-sample approach on the
main immigration data. This is also run on the HPC and needs processing before
it can be used to create figures.

Both of these processing steps can be run with the following bash script, which
calls the relevant `.R` files.

```
bash prep_data_for_figures.sh
```

It creates a number of files that are placed in the `final_output` folder.
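
To confirm that the processing step produced everything the figure scripts need, the following sketch checks for the nine files named under `final_output` in the file listing. The function `check_final_output` is a hypothetical helper and not part of the archive.

```bash
# Hypothetical helper (not part of the archive): verify that the processed
# files listed under `final_output` in the File Listing are all present.
# Argument: path to the folder (defaults to final_output).
check_final_output() {
  local dir="${1:-final_output}" missing=0 name
  for name in final_AME final_cjoint final_HTE final_HTE_binscatter \
              final_HTE_by_AME final_OOS final_posterior final_sos \
              repeat_final_output; do
    if [ ! -f "$dir/$name.RDS" ]; then
      echo "missing: $dir/$name.RDS" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every expected file was found.
  [ "$missing" -eq 0 ]
}
```

The function prints one line per missing file to standard error and returns a non-zero status if any file is absent, so it can guard a later step, as in `check_final_output && bash create_figures.sh`.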

## References

Hainmueller, Jens, 2014, "Replication data for: The Hidden American Immigration Consensus: A Conjoint Analysis of Attitudes toward Immigrants", https://doi.org/10.7910/DVN/25505, Harvard Dataverse, V2, UNF:5:tevlw4dhO6Evym2pA6vSZg==

Hainmueller, J. and Hopkins, D.J., 2015. The hidden American immigration consensus: A conjoint analysis of attitudes toward immigrants. American Journal of Political Science, 59(3), pp.529-548.